Visualization in R - Graphics

John House PhD

March 30th 2024

Visualization in R - Graphics with base R functions and with GGPLOT2

Learning Outcomes

Tidyverse and ggplot2

We’ll use the built-in airquality dataset to illustrate

-   ?airquality for information 
-   Daily air quality measurements in New York, May to September 1973 
airquality %>% head %>% 
  kbl(caption = "First 6 records of the airquality Dataset in R") %>%
  kable_classic_2(full_width = F)
First 6 records of the airquality Dataset in R
Ozone Solar.R Wind Temp Month Day
41 190 7.4 67 5 1
36 118 8.0 72 5 2
12 149 12.6 74 5 3
18 313 11.5 62 5 4
NA NA 14.3 56 5 5
28 NA 14.9 66 5 6
print("Structure of the airquality dataset in R")
[1] "Structure of the airquality dataset in R"
str(airquality)
'data.frame':   153 obs. of  6 variables:
 $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
 $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
 $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
 $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
 $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
 $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...

Graphics! - Scatterplots

Used to examine the relationship between two (usually numeric) variables

airquality %>% 
  ggplot( ., aes(x = Ozone, y = Temp)) + ## Set up the aesthetics
  geom_point(color = "slateblue4", size = 4, alpha = .3) + ## Specify a point plot
  ggtitle("Relationship between O3 and Temperature") + ## Add in descriptive title for our graph 
  xlab("Ozone (PPM)") + ## Add in descriptive label for our axis
  ylab("Temperature(F)") ## Add in descriptive label for our axis
Warning: Removed 37 rows containing missing values (`geom_point()`).

Things to notice

Base R Scatterplot using plot()

plot(x = airquality$Ozone, 
     y = airquality$Temp,
     col = alpha("slateblue4", 0.4),
     pch = 16,
     main = "Relationship between O3 and Temperature",
     xlab = "Ozone (PPM)",
     ylab = "Temperature(F)")

Back to GGPLOT2 - Scatterplot - Color by Month

airquality %>% 
  mutate(Month = as.factor(Month)) %>% ## Change Month to a Categorical variable 
  ggplot( ., aes(x = Ozone, y = Temp, color = Month)) + ## addin color aesthetic 
  geom_point(size = 4, alpha = .4) +  ## these are actually additional aesthetics added to the geom_point() layer
    ggtitle(label = "Relationship between O3 and Temperature", ## Add Title
            subtitle = "Colored by Month of the Year") + ## Add subtitle
  xlab("Ozone (PPM)") + ## Change X axis label 
  ylab("Temperature(F)") +  ## Change Y axis label
  theme_classic() ## use different theme to change plot appearance 
Warning: Removed 37 rows containing missing values (`geom_point()`).

Scatterplot - Symbol by month (shape)

airquality %>% 
  mutate(Month = as.factor(Month)) %>% ## Change Month to a Categorical variable 
  ggplot( ., aes(x = Ozone, y = Temp, shape = Month)) +
  geom_point(size = 4, alpha = .4) +
    ggtitle(label = "Relationship between O3 and Temperature", 
            subtitle = "Shapes indicate Month") +
  xlab("Ozone (PPM)") +
  ylab("Temperature(F)") +
  theme_classic() ## use different theme to change plot appearance 
Warning: Removed 37 rows containing missing values (`geom_point()`).

Scatterplot - Size by Solar Radiation

airquality %>% 
  mutate(Month = as.factor(Month)) %>% ## Change Month to a Categorical variable 
  ggplot( ., aes(x = Ozone, y = Temp, size = Solar.R, color = Month)) +
  geom_point(alpha = .4) +
    ggtitle(label = "Relationship between O3 and Temperature", 
            subtitle = "Colored by Month of the Year") +
  xlab("Ozone (PPM)") +
  ylab("Temperature(F)") +
  theme_classic() ## use different theme to change plot appearance 
Warning: Removed 42 rows containing missing values (`geom_point()`).

These were all changing of aesthetics for the same geometry - geom_point() which prints a scatterplot

Adding lines through X,Y numerical data with the geom_smooth() geometry

geom_smooth() is a geometry used for drawing lines through the data

airquality %>% 
  ggplot( ., aes(x = Ozone, y = Temp)) +
  geom_smooth() +
  ggtitle(label = "Relationship between O3 and Temperature") +
  xlab("Ozone (PPM)") +
  ylab("Temperature(F)") +
  theme_classic() ## use different theme to change plot appearance 
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 37 rows containing non-finite values (`stat_smooth()`).

Notice that we don’t have the points, just a line through the points

Layering geometries

airquality %>% 
  # mutate(Month = as.factor(Month)) %>% ## Change Month to a Categorical variable 
  ggplot( ., aes(x = Ozone, y = Temp)) +
  geom_point(size = 4, alpha = .4, color = "purple") +
  geom_smooth(alpha = .25) + ## lighten the shading of the standard error shading
  ggtitle(label = "Relationship between O3 and Temperature") +
  xlab("Ozone (PPM)") +
  ylab("Temperature(F)") +
  theme_classic() ## use different theme to change plot appearance 
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 37 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 37 rows containing missing values (`geom_point()`).

Doing the smoothing by month

airquality %>% 
  ggplot( ., aes(x = Ozone, y = Temp, color = as.factor(Month))) +
  geom_smooth() +
  geom_point()
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 37 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 37 rows containing missing values (`geom_point()`).

Doing the smoothing by month and using a linear fit!

airquality %>% 
  mutate(Month = as.factor(Month)) %>% ## Change Month to a Categorical variable 
  ggplot( ., aes(x = Ozone, y = Temp, color = Month)) + ##
  geom_point(size = 4, alpha = .7) +
  geom_smooth(method = lm, se = T, alpha = .2) + ## turn off error bars around the fit to see better
  ggtitle(label = "Relationship between O3 and Temperature",
          subtitle = "Linear Fit Through Data by Each Month") +
  xlab("Ozone (PPM)") +
  ylab("Temperature(F)") +
  theme_classic() ## use different theme to change plot appearance 
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 37 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 37 rows containing missing values (`geom_point()`).

Geometries and Subsetting

airquality %>% 
  ggplot( ., aes(x = Ozone, y = Temp, color = as.factor(Month))) +
  geom_point() + ## This layer uses all data in these two fields
  geom_point(
    data = airquality %>%  filter(Month == 8),  ## For this layer, we pass the same dataset and filter it on the fly!
    color = "red", size = 4, shape = 16) + ## And specify different formatting parameters 
  geom_smooth(data = airquality %>% filter(Month == 8), alpha = .2) +  ## similarly for we pass geom_smooth() a subset of the data 
  ggtitle(label = "Relationship between O3 and Temperature",
          subtitle = "August Data Shown in Red and Fitted") +
  xlab("Ozone (PPM)") +
  ylab("Temperature(F)") +
  theme_classic()
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 5 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 37 rows containing missing values (`geom_point()`).
Warning: Removed 5 rows containing missing values (`geom_point()`).

Barplots and GWAS Dataset

We’ll load the GWAS dataset and do some munging prior to graphing
load("gwas.RData") 

gwas %>% 
  dplyr::filter(CHR_ID %in% as.character(1:22)) %>% ## keep only CHR in 1:22
  mutate(CHR = as.numeric(CHR_ID)) %>% ## create a factor variable for our chromosomes 
  mutate(Year = as.factor(year((DATE)))) %>% 
  group_by(Year) %>% ## group by Year
  summarise(GWAS_Hits_Per_Year = n()) %>% ## Calculate rows of the dataset by year for these 1:22 Chromosomes
  ggplot(data = ., aes(x = Year, y = GWAS_Hits_Per_Year/1000)) +  ## Create bar chart of the findings by year
  geom_bar(position = "stack", stat = "identity", fill = "slateblue4") +
  ggtitle(label = "Variants Published Per Year in GWAS Catalog",subtitle = "in 1000s") +
  geom_text(aes(label=round(GWAS_Hits_Per_Year/1000,1)), position=position_dodge(width=0.9), vjust=-0.25) +
  theme_classic()

Note: We are able to do some calculations within ggplot syntax as seen by dividing the entries/year by 1000 in the aes() and geom_text() statements

using a function cumsum() to specify a special kind of aesthetic

gwas %>% 
  dplyr::filter(CHR_ID %in% as.character(1:22)) %>% ## keep only CHR in 1:22
  mutate(Year = as.factor(year((DATE)))) %>% 
  group_by(Year) %>% ## group by Year and Chromosome 
  summarise(GWAS_Hits_Per_Year = n()) %>% 
  ggplot(data = ., aes(x = Year, y = cumsum(GWAS_Hits_Per_Year))) +  ## Create cumulative summed bar chart through time
    geom_bar(position = "stack", stat = "identity")  ## Stat = identity means use the data given to

Boxplots and Violin Plots

head(diamonds)
# A tibble: 6 × 10
  carat cut       color clarity depth table price     x     y     z
  <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48

How does the variance and spread of price vary by levels of clarity?

Violin plot with geom_violin()

diamonds %>% 
  ggplot(aes(x = price, y = 1)) +
  geom_violin() +
  ggtitle("Dispersion of Price in the diamonds dataset")

Boxplot with geom_boxplot

diamonds %>% 
  ggplot(aes(x = price)) +
  geom_boxplot() +
  ggtitle("Dispersion of Price in the diamonds dataset")

Maybe that is not the whole picture?

What does it look like by clarity code

diamonds %>% 
  ggplot(aes(x = price, fill = clarity, y = clarity)) +
  geom_violin() +
  ggtitle("Dispersion of Price in the diamonds dataset\n by Clarity")

What does it look like by color code

diamonds %>% 
  ggplot(aes(x = price, fill = color, y = color)) +
  geom_violin() +
  ggtitle("Dispersion of Price in the diamonds dataset\n by color of the diamond")

These were the tinyest subset of possible graph types!

Excellent resource for all kinds of plot types with the ggplot2 package: https://r-graph-gallery.com/ggplot2-package.html

Excellent resource for ALL KINDS of plots with and without ggplot2 - https://r-graph-gallery.com/

Plotting Part II!

Large (Real) Gene Expression Dataset

load("gene_expression_data.Rdata")

## Turn genes and treatments into factors
combined_data %>% 
  dplyr::mutate(gene = factor(gene), 
                treatment = factor(treatment, levels = c("Treatment_1" , "Treatment_2" , "Treatment_3" , "Treatment_4" , "Treatment_5" , "Treatment_6" , "Treatment_7" , "Treatment_8" , "Treatment_9" , "Treatment_10" ,"Treatment_11"))) -> combined_data


head(combined_data, 3) %>% 
  kbl() %>%
  kable_styling()
external_gene_name description gene log2FoldChange padj treatment
Gnai3 guanine nucleotide binding protein (G protein), alpha inhibiting 3 [Source:MGI Symbol;Acc:MGI:95773] ENSMUSG00000000001 -0.0064279 0.9775395 Treatment_1
Cdc45 cell division cycle 45 [Source:MGI Symbol;Acc:MGI:1338073] ENSMUSG00000000028 -0.1222427 0.8407751 Treatment_1
Scml2 sex comb on midleg-like 2 (Drosophila) [Source:MGI Symbol;Acc:MGI:1340042] ENSMUSG00000000037 -0.8846358 0.1358723 Treatment_1
summary(combined_data)
 external_gene_name description                        gene       
 Length:202106      Length:202106      ENSMUSG00000000001:    11  
 Class :character   Class :character   ENSMUSG00000000028:    11  
 Mode  :character   Mode  :character   ENSMUSG00000000037:    11  
                                       ENSMUSG00000000049:    11  
                                       ENSMUSG00000000056:    11  
                                       ENSMUSG00000000058:    11  
                                       (Other)           :202040  
 log2FoldChange           padj              treatment    
 Min.   :-6.517646   Min.   :0.0000   Treatment_1:18374  
 1st Qu.:-0.250569   1st Qu.:0.4188   Treatment_2:18374  
 Median : 0.000024   Median :0.7943   Treatment_3:18374  
 Mean   : 0.001778   Mean   :0.6694   Treatment_4:18374  
 3rd Qu.: 0.249589   3rd Qu.:0.9762   Treatment_5:18374  
 Max.   : 8.768892   Max.   :1.0000   Treatment_6:18374  
                                      (Other)    :91862  
length(unique(combined_data$gene))
[1] 18375
unique(combined_data$treatment)
 [1] Treatment_1  Treatment_2  Treatment_3  Treatment_4  Treatment_5 
 [6] Treatment_6  Treatment_7  Treatment_8  Treatment_9  Treatment_10
[11] Treatment_11
11 Levels: Treatment_1 Treatment_2 Treatment_3 Treatment_4 ... Treatment_11

Data Munging

How many differentially expressed genes with adjusted p.value < 0.10?

combined_data %>% 
  dplyr::filter(padj < 0.10) %>% 
  group_by(treatment) %>% 
  summarize(DEGs = n()) %>% 
  kbl(caption = "Differentially Expressed Genes with FDR <= 10%") %>% 
  kable_minimal(full_width= F)
Differentially Expressed Genes with FDR <= 10%
treatment DEGs
Treatment_1 7497
Treatment_2 481
Treatment_3 890
Treatment_4 7546
Treatment_5 748
Treatment_6 2475
Treatment_7 292
Treatment_8 552
Treatment_9 501
Treatment_10 456
Treatment_11 1915

What are the top 10 genes with FDR 10% or less across all 11 treatments??

combined_data %>% 
  dplyr::filter(padj < 0.10) %>% 
  group_by(external_gene_name, description) %>% 
  summarise(Treatments_with_Significance = n()) %>% 
  arrange(desc(Treatments_with_Significance)) -> top_DEGs
`summarise()` has grouped output by 'external_gene_name'. You can override
using the `.groups` argument.
head(top_DEGs, 10) %>% 
  kbl(caption = "Top 10 DEGs across Treatment Space") %>% 
  kable_minimal(full_width= F)
Top 10 DEGs across Treatment Space
external_gene_name description Treatments_with_Significance
Aaas achalasia, adrenocortical insufficiency, alacrimia [Source:MGI Symbol;Acc:MGI:2443767] 9
Dtx3l deltex 3-like, E3 ubiquitin ligase [Source:MGI Symbol;Acc:MGI:2656973] 9
Fbxw9 F-box and WD-40 domain protein 9 [Source:MGI Symbol;Acc:MGI:1915878] 9
Irgm2 immunity-related GTPase family M member 2 [Source:MGI Symbol;Acc:MGI:1926262] 9
Oas1g 2’-5’ oligoadenylate synthetase 1G [Source:MGI Symbol;Acc:MGI:97429] 9
Olah oleoyl-ACP hydrolase [Source:MGI Symbol;Acc:MGI:2139018] 9
Sp100 nuclear antigen Sp100 [Source:MGI Symbol;Acc:MGI:109561] 9
Tmem229b transmembrane protein 229B [Source:MGI Symbol;Acc:MGI:2444389] 9
Bloc1s2 biogenesis of lysosomal organelles complex-1, subunit 2 [Source:MGI Symbol;Acc:MGI:1920939] 8
Ccng1 cyclin G1 [Source:MGI Symbol;Acc:MGI:102890] 8

Create some individual graphs

For the top 6 genes, what do the log2fc values look like across treatments?

Let’s write a function!

jsh_l2fc_plot <- function(genetofilter = "Aaas", ## gene we want to graph
                          color = "slateblue"){  ## color of the bars on our graph
  combined_data %>% ## Take large dataset
    dplyr::filter(external_gene_name == genetofilter) %>% ## filter to specified gene 
    mutate(padj = formatC(padj, format = "e", digits = 2)) -> mydata ## Make a cleaner printed value of our adjusted pvalue 
  
  ## Get minimum l2fc for correct placement of text we'll use later in geom_label()
  min_l2fc <- min(mydata$log2FoldChange)
  
  mydata %>% 
    ggplot(aes(y = log2FoldChange, x = treatment)) + ## set aesthetics 
    geom_bar(stat = "identity", fill = color, color = "black") + ## call a bar plot and tell it to use actual values instead of trying to "count" 
    ylab("Log 2 Fold Change") + ## label Y axis
    xlab("") + ## label X axis 
    ylim(min_l2fc*1.6, max(1.1*mydata$log2FoldChange)) + ## Set limits based on the miniumum l2fc value
    geom_text(aes(label = padj, y = 1.4*min_l2fc), size = 2) + ## Add text to our plot based on that minimum!
    coord_flip() + ## Flip the graph to be horizontal bars
    theme_bw() + ## clean theme
    ### NOTICE WE CAN HAVE THE TITLE CHANGE BASED ON A PASSED PARAMETER!
    ggtitle(paste0("Expression Change Across Treatments\nGene = ", genetofilter), subtitle = "Adjusted Pvalue is printed on left") -> p1 
  return(p1)  ## return object from ggplot2
}

## Default Plot
jsh_l2fc_plot()

## Plot with different gene and color
jsh_l2fc_plot(genetofilter = "Fbxw9", color = "orange")

Calling our Function

## Define a vector with our genes to call in our function 
my_top10_genes <- top_DEGs %>% 
  pull(external_gene_name) %>% ## Create Vector 
  .[1:10] ## pull the first 10 items - the "." acts as the "input" from the previous operation in the piping %>% 
my_top10_genes
 [1] "Aaas"     "Dtx3l"    "Fbxw9"    "Irgm2"    "Oas1g"    "Olah"    
 [7] "Sp100"    "Tmem229b" "Bloc1s2"  "Ccng1"   
## lets create our colors from https://colorbrewer2.org/#type=diverging&scheme=PRGn&n=9
mycolors <- c('#762a83','#9970ab','#c2a5cf','#e7d4e8','#fdb863','#d9f0d3','#a6dba0','#5aae61','#1b7837')

mygraphs <- list()  ## create empty list to store our graphs into 

for (i in seq_along(my_top10_genes)){
  mygraphs[[i]] <- jsh_l2fc_plot(genetofilter = my_top10_genes[i], color = mycolors[i])
}

mygraphs[2]
[[1]]

mygraphs[6]
[[1]]

We can use the ggpubr package to arrange multiple plots

library(ggpubr)
ggarrange(plotlist = list(mygraphs[[1]], mygraphs[[2]], mygraphs[[3]], mygraphs[[4]]))

The Titles Are Too Big For The Plot Area

Customizing GGPlots with theme elements

We can use theme settings to customize many of the features of our plots

GGPLOT2_REFERENCE

theme( line, rect, text, title, aspect.ratio, axis.title, axis.title.x, axis.title.x.top, axis.title.x.bottom, axis.title.y, axis.title.y.left, axis.title.y.right, axis.text, axis.text.x, axis.text.x.top, axis.text.x.bottom, axis.text.y, axis.text.y.left, axis.text.y.right, axis.ticks, axis.ticks.x, axis.ticks.x.top, axis.ticks.x.bottom, axis.ticks.y, axis.ticks.y.left, axis.ticks.y.right, axis.ticks.length, axis.ticks.length.x, axis.ticks.length.x.top, axis.ticks.length.x.bottom, axis.ticks.length.y, axis.ticks.length.y.left, axis.ticks.length.y.right, axis.line, axis.line.x, axis.line.x.top, axis.line.x.bottom, axis.line.y, axis.line.y.left, axis.line.y.right, legend.background, legend.margin, legend.spacing, legend.spacing.x,
legend.spacing.y, legend.key, legend.key.size, legend.key.height, legend.key.width, legend.text, legend.text.align, legend.title, legend.title.align, legend.position, legend.direction, legend.justification,
legend.box, legend.box.just, legend.box.margin, legend.box.background, legend.box.spacing, panel.background, panel.border, panel.spacing, panel.spacing.x, panel.spacing.y, panel.grid, panel.grid.major, panel.grid.minor, panel.grid.major.x, panel.grid.major.y, panel.grid.minor.x, panel.grid.minor.y, panel.ontop, plot.background, plot.title, plot.title.position, plot.subtitle, plot.caption, plot.caption.position, plot.tag, plot.tag.position, plot.margin, strip.background, strip.background.x, strip.background.y, strip.clip, strip.placement, strip.text, strip.text.x, strip.text.x.bottom, strip.text.x.top, strip.text.y, strip.text.y.left, strip.text.y.right, strip.switch.pad.grid, strip.switch.pad.wrap )

For titles, theme(plot.title = element_text(size = xx))

mygraphs[[1]] <- mygraphs[[1]] + 
  theme(plot.title = element_text(size = 9)) 

ggarrange(plotlist = list(mygraphs[[1]], mygraphs[[2]], mygraphs[[3]], mygraphs[[4]]))

for(i in seq_along(mygraphs)){
  mygraphs[[i]] <- mygraphs[[i]] +
    theme(plot.title = element_text(size = 9)) 
}

ggarrange(plotlist = list(mygraphs[[1]], mygraphs[[2]], mygraphs[[3]], mygraphs[[4]]))  

This still doesn’t look great. We can either:

Change some more theme elements

for(i in seq_along(mygraphs)){
  mygraphs[[i]] <- mygraphs[[i]] +
    theme(axis.text.y = element_text(angle = 45, size = 5))
}

ggarrange(plotlist = list(mygraphs[[6]], mygraphs[[7]], mygraphs[[8]], mygraphs[[9]]))  

Change size of plotting Area

# In the r coding chunk options   ```{r, fig.height = 7, fig.width=12}
ggarrange(plotlist = list(mygraphs[[6]], mygraphs[[7]], mygraphs[[8]], mygraphs[[9]]))  

Change Layout of Plotting Area (And size)

## ```{r fig.height=9, fig.width = 6} 
ggarrange(plotlist = list(mygraphs[[6]], mygraphs[[7]], mygraphs[[8]], mygraphs[[9]]),
          ncol = 1)

Shorten Treatment instead

combined_data  %>% 
  rowwise() %>% 
  mutate(treatment = unlist(str_split(treatment, "_"))[2]) %>% 
  mutate(treatment = factor(treatment)) %>% 
  ungroup() -> combined_data

for (i in seq_along(my_top10_genes)){
  mygraphs[[i]] <- jsh_l2fc_plot(genetofilter = my_top10_genes[i], color = mycolors[i]) + xlab("Treatment")
}

ggarrange(plotlist = list(mygraphs[[6]], mygraphs[[7]], mygraphs[[8]], mygraphs[[9]]))

Using facet_wrap() in ggplot2

iris %>% ggplot(aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 4) +
  ggtitle("Sepal Width vs. Height in Iris Species") +
  geom_smooth() +
  theme_classic() +
  theme(legend.position="bottom")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Using facet_wrap() in ggplot2

Spreading across Species into Separate Plots

iris %>% ggplot(aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 4) +
  ggtitle("Sepal Width vs. Height in Iris Species") +
  geom_smooth() +
  theme_classic() +
  theme(legend.position="bottom") +
  facet_wrap(~Species)
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Using facet_wrap() in ggplot2

Can have the plots stacked instead

iris %>% ggplot(aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 4) +
  ggtitle("Sepal Width vs. Height in Iris Species") +
  geom_smooth() +
  theme_classic() +
  theme(legend.position="bottom") +
  facet_wrap(~Species, ncol = 1)
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Using facet_wrap() in ggplot2

We can also arrange across two variables

head(mpg,5) %>% kbl %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = F)
manufacturer model displ year cyl trans drv cty hwy fl class
audi a4 1.8 1999 4 auto(l5) f 18 29 p compact
audi a4 1.8 1999 4 manual(m5) f 21 29 p compact
audi a4 2.0 2008 4 manual(m6) f 20 31 p compact
audi a4 2.0 2008 4 auto(av) f 21 30 p compact
audi a4 2.8 1999 6 auto(l5) f 16 26 p compact
mpg %>% 
  ggplot(aes(x = cty, y = displ, color = manufacturer)) +
  geom_point(size =3, alpha = .7) +
  stat_smooth(method = "lm", ## linear model (linear regression)
              formula = y ~ x, ## regression line based on the aes() values
              se = F,
              color = "darkorchid4") + ## Don't print error bands
  ggtitle("Relationship Between Displacement and City MPG\nin the mpg Dataset") +
  xlab("City Miles Per Gallon") +
  ylab("Engine Displacement in Liters") +
  theme_dark() -> myplot
myplot  

Arranging across two variables

mpg %>% 
  mutate(drive_train = case_when(drv == "4" ~ "4WD", ## Use case_when from dplyr package to create new variable based on levels of drv
                                 drv == "f" ~ "FWD",
                                 drv == "r" ~ "RWD")) %>% 
  mutate(drive_train = factor(drive_train, levels = c("FWD","RWD","4WD")))  %>% ## Factor that new variable and set factor order!
  ggplot(aes(x = cty, y = displ, color = manufacturer)) +
  geom_point(size =3, alpha = .7) +
  stat_smooth(method = "lm", ## linear model (linear regression)
              formula = y ~ x, ## regression line based on the aes() values
              se = F,
              color = "darkorchid4") + ## Don't print error bands
  ggtitle("Relationship Between Displacement and City MPG\nBy Cylinder Number and Drive Train Type") +
  xlab("City Miles Per Gallon") +
  ylab("Engine Displacement in Liters") +
  facet_grid(cyl~drive_train) +
  theme_dark() -> myplot2
myplot2

Plotly!

Plotly allows the creation of interactive graphics and uses ggplot objects!

Plotly

library(plotly) ## Load our library

mpg %>% 
  mutate(mylabel = paste0(year,":",manufacturer,":",model)) %>% 
  ggplot(aes(x = cty, y = displ, color = manufacturer, label = mylabel)) +
  geom_point(size = 3, alpha = .7) +
  stat_smooth(method = "lm", ## linear model (linear regression)
              formula = y ~ x, ## regression line based on the aes() values
              se = F, ## Don't print error bands
              color = "darkorchid4") + 
  ggtitle("Relationship Between Displacement and City MPG in the mpg Dataset") +
  xlab("City Miles Per Gallon") +
  ylab("Engine Displacement in Liters") +
  theme_dark() -> myplot
ggplotly(myplot) -> gplot.object 
gplot.object

We can also ouput this widget!

library(htmlwidgets)
saveWidget(gplot.object,"gplot.object.html", selfcontained = T)